Embedding Service API

A comprehensive FastAPI-based embedding service with RAG (Retrieval-Augmented Generation) capabilities, supporting Triton inference server, Elasticsearch vector stores, and MongoDB for prompt management.

Features
  • Embedding Generation: Generate embeddings with the Triton-hosted Nomic embedding model
  • RAG Support: Retrieval-Augmented Generation backed by Elasticsearch vector stores
  • Hybrid Retrieval: Query fusion retriever combining multiple data sources
  • LLM Reranking: Rerank retrieved documents with an LLM
  • MongoDB Integration: Store and retrieve prompts and configurations
  • Health Checks: Monitor service status and component availability
  • Async Support: Asynchronous operations for better performance

Architecture

embedding-service/
├── app/
│   ├── app.py                      # Application entry point
│   ├── single_index.py             # Main FastAPI application
│   ├── triton_nomic_embedding.py   # Triton embedding client
│   └── main.py                     # (unused)
├── mongo/
│   └── mongodbservice.py           # MongoDB service and repositories
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment variables template
└── README.md                       # This file

Installation

  1. Clone the repository

    git clone https://github.com/CastleInc/Embedding-Service.git
    cd Embedding-Service
  2. Create a virtual environment

    python -m venv .venv
    source .venv/bin/activate   # macOS/Linux
    # .venv\Scripts\activate    # Windows
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure environment variables

    cp .env.example .env
    # Edit .env with your configuration

Configuration

Edit the .env file with your settings:

# Server
PORT=8000
HOST=0.0.0.0

# LLM Configuration
LLM_MODEL_NAME=llama-2-13b-chat
LLM_HOST=http://localhost:8000/v1
LLM_API_KEY=your-api-key

# Embedding Model
EMBEDDING_MODEL_NAME=nomic-ai_nomic-embed-text-v1.5-ensemble
EMBEDDING_API_BASE=http://localhost:8000

# MongoDB
MONGO_HOST=localhost
MONGO_USERNAME=admin
MONGO_PASSWORD=password

# Elasticsearch
ES_HOST=localhost
ES_USER=elastic
ES_PASSWORD=password
SECURITY_REPORT_INDEX_NAME=security_reports
CVE_INDEX_NAME=cve_data

# Retrieval Settings
TOP_K_AFTER_RERANK=5
SIMILARITY_TOP_K=10
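
The retrieval settings above can be read at startup with `os.getenv`, falling back to the documented defaults. This is a minimal sketch of the pattern; the actual service may load its configuration differently:

```python
import os

def load_retrieval_settings():
    """Read retrieval tuning knobs from the environment, falling back to defaults."""
    return {
        "similarity_top_k": int(os.getenv("SIMILARITY_TOP_K", "10")),
        "top_k_after_rerank": int(os.getenv("TOP_K_AFTER_RERANK", "5")),
    }

settings = load_retrieval_settings()
```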

Running the Service

Method 1: Using app.py (Recommended)

cd app
python app.py

Method 2: Using uvicorn directly

cd app
uvicorn single_index:app --host 0.0.0.0 --port 8000

Method 3: Using the module's main

cd app
python single_index.py

API Endpoints

1. Root Endpoint

GET /

Returns service information and available endpoints.

Response:

{
  "service": "Embedding Service API",
  "version": "1.0.0",
  "status": "running",
  "endpoints": {
    "health": "/health",
    "embeddings": "/v1/embeddings",
    "prompt": "/v1/prompt",
    "retrieve": "/v1/retrieve"
  }
}

2. Health Check

GET /health

Check service health and component status.

Response:

{
  "status": "healthy",
  "embedding_model": true,
  "vector_stores": true,
  "mongodb": true
}

3. Generate Embeddings

POST /v1/embeddings

Generate embeddings for provided texts.

Request:

{
  "texts": ["Hello world", "How are you?"]
}

Response:

{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "nomic-ai_nomic-embed-text-v1.5-ensemble",
  "dimensions": 768
}
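
Once you have the 768-dimensional vectors, similarity between the input texts can be computed client-side. A minimal cosine-similarity helper (pure Python, independent of the service itself):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# e.g. compare the two vectors returned in the "embeddings" field above:
# sim = cosine_similarity(embeddings[0], embeddings[1])
```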

4. Retrieve Documents

POST /v1/retrieve

Retrieve relevant documents without generating a response.

Request:

{
  "query": "What are the security vulnerabilities?",
  "summary": "Optional summary"
}

Response:

{
  "query": "What are the security vulnerabilities?",
  "documents": [
    {
      "page": 1,
      "file_path": "/path/to/file.pdf",
      "file_name": "security_report.pdf",
      "score": 0.95,
      "text": "Document text...",
      "type": "pdf",
      "others": {}
    }
  ],
  "count": 1,
  "has_context": true
}

5. RAG Prompt Generation

POST /v1/prompt

Generate a RAG-enhanced prompt with retrieved context.

Request:

{
  "query": "Explain the CVE-2023-1234",
  "summary": "Optional summary"
}

Response:

{
  "response": "Context 1: ...\nContext 2: ...",
  "metadata_list": [...],
  "prompt": "Formatted prompt with context",
  "system_message": "System prompt",
  "has_context": true,
  "retrievers_list": ["security_reports", "cve_store"]
}

Testing the API

Using curl

# Health check
curl http://localhost:8000/health

# Generate embeddings
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Hello world", "Test embedding"]}'

# Retrieve documents
curl -X POST http://localhost:8000/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "security vulnerabilities"}'

Using Python

import requests

# Generate embeddings
response = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"texts": ["Hello world", "Test embedding"]}
)
print(response.json())

# Retrieve documents
response = requests.post(
    "http://localhost:8000/v1/retrieve",
    json={"query": "security vulnerabilities"}
)
print(response.json())

Components

Triton Nomic Embedding

  • Connects to Triton Inference Server
  • Supports batch processing
  • Handles base64 encoding for text inputs
  • Applies L2 normalization and mean pooling
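
The mean pooling and L2 normalization steps can be sketched as follows (illustrative only; the real client operates on Triton's tensor outputs rather than Python lists):

```python
import math

def mean_pool(token_embeddings):
    """Average the per-token vectors into a single sentence vector."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Two fake 2-d token embeddings -> one unit-length sentence embedding
pooled = l2_normalize(mean_pool([[1.0, 0.0], [0.0, 1.0]]))
```

After normalization, every embedding has unit length, so downstream stores can use a plain dot product as the similarity score.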

Vector Stores

  • Elasticsearch: Primary vector store for document retrieval
  • Hybrid Retrieval: Combines multiple retrievers using reciprocal rank fusion
  • LLM Reranking: Uses LLM to rerank retrieved documents
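
Reciprocal rank fusion scores each document as the sum of 1/(k + rank) over every retriever's ranking, so documents that rank highly in several lists win. A minimal sketch (the `k=60` constant is the conventional default, not necessarily what this service uses):

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked result lists from multiple retrievers into one ordering."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],  # e.g. ranking from the security_reports index
    ["doc_b", "doc_c", "doc_a"],  # e.g. ranking from the cve_data index
])
```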

MongoDB

  • Stores prompts and configurations
  • Singleton pattern for connection pooling
  • Automatic reconnection handling
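
The singleton pattern keeps one shared client, and therefore one connection pool, per process. A hedged sketch of the idea, not the actual `mongodbservice.py` code:

```python
class MongoService:
    """Process-wide singleton wrapping a single MongoDB client."""
    _instance = None

    def __new__(cls):
        # Reuse the existing instance so every caller shares one connection pool
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.client = None  # real code would create pymongo.MongoClient here
        return cls._instance
```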

Error Handling

The service provides graceful degradation:

  • If Elasticsearch is unavailable, embeddings-only mode is enabled
  • If MongoDB is unavailable, the service falls back to default prompts
  • All endpoints return proper HTTP status codes and error messages
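
The MongoDB fallback can be expressed as a simple try-with-default pattern (illustrative; `get_system_prompt` and the default text are hypothetical names, not the service's actual API):

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."  # hypothetical default

def get_system_prompt(fetch_prompt):
    """Return the stored prompt, or the default if the prompt store is down."""
    try:
        return fetch_prompt()
    except Exception:
        # Graceful degradation: keep serving requests with built-in defaults
        return DEFAULT_SYSTEM_PROMPT

def _unavailable():
    raise ConnectionError("MongoDB unreachable")

prompt = get_system_prompt(_unavailable)
```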

Development

Running in Development Mode

cd app
uvicorn single_index:app --reload --host 0.0.0.0 --port 8000

Checking for Errors

# Check Python syntax
python -m py_compile app/single_index.py

# Run with debug logging
LOG_LEVEL=DEBUG python app/app.py

Deployment

Using systemd (Linux)

Create /etc/systemd/system/embedding-service.service:

[Unit]
Description=Embedding Service API
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/Embedding-Service/app
Environment="PATH=/path/to/.venv/bin"
ExecStart=/path/to/.venv/bin/python app.py
Restart=always

[Install]
WantedBy=multi-user.target

Then:

sudo systemctl daemon-reload
sudo systemctl enable embedding-service
sudo systemctl start embedding-service

Troubleshooting

Import Errors

If you encounter import errors with MongoDB:

# The service uses sys.path.append to handle imports
# Make sure you're running from the correct directory

Connection Issues

  • Verify Triton server is running and accessible
  • Check Elasticsearch cluster status
  • Verify MongoDB connection string

Performance

  • Adjust max_batch_size in TritonNomicEmbedding for better throughput
  • Tune SIMILARITY_TOP_K and TOP_K_AFTER_RERANK for retrieval quality
  • Use connection pooling for MongoDB (already configured)
